The training phase is the most important stage in the machine learning process. In the case of labeled data and supervised learning, machine training consists in minimizing a loss function subject to various constraints. In an abstract setting, it can be formulated as a multiple-criteria optimization model in which each criterion measures the distance between the output associated with a specific input and its label. Therefore, the fitting term is a vector function, and its minimization is intended in the Pareto sense. We provide stability results for the efficient solutions with respect to perturbations of the input and output data. We then extend the same approach to the case of learning with multiple datasets. The multiple-dataset setting is relevant for reducing the bias due to the choice of a particular training set. We propose a scalarization approach to implement this model, along with numerical experiments on digit classification using the MNIST data.
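As a rough sketch of the scalarization idea, the vector of per-example criteria can be collapsed into a single objective via a weighted sum; strictly positive weights make the resulting minimizers Pareto-efficient. The function name, the cross-entropy distance, and the weighting scheme below are illustrative assumptions, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def scalarized_loss(model, inputs, labels, weights):
    # one criterion per example: the distance between the output for
    # x_i and its label y_i (here measured by cross-entropy)
    per_example = F.cross_entropy(model(inputs), labels, reduction="none")
    # linear scalarization of the vector-valued fitting term;
    # strictly positive weights yield Pareto-efficient minimizers
    return (weights * per_example).sum()

# equal weights recover the usual mean loss (illustrative choice):
# weights = torch.full((batch_size,), 1.0 / batch_size)
```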
Deep learning methods are renowned for their performance, yet their lack of interpretability can keep them out of high-stakes contexts. Recent model-agnostic methods address this problem by providing post-hoc interpretability, reverse-engineering the model's inner workings. However, in many regulated fields, interpretability should be kept in mind from the beginning, which means that post-hoc methods act only as a sanity check after model training. Interpretability from the start, in an abstract setting, means imposing a set of soft constraints on the model's behavior by injecting knowledge, and possibly biases, into it. We propose a multicriteria technique that allows us to control the feature effects on the model's outcome by injecting knowledge into the objective function. We then extend the technique by including non-linear knowledge functions to account for more complex effects and a local lack of knowledge. The result is a deep learning model that embodies interpretability from the start and aligns with recent regulations. A practical empirical example based on credit risk shows that our approach creates performant and robust models capable of overcoming biases derived from data scarcity.
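A minimal sketch of how knowledge could be injected as a soft constraint in the objective: here, an assumed monotonicity prior on one feature is encoded as a hinge penalty on gradients that contradict it. The feature index, the sign convention, and the hinge form are all hypothetical, not the paper's exact knowledge function:

```python
import torch

def knowledge_penalty(model, x, feature_idx, sign=1.0):
    # soft constraint: the effect of feature `feature_idx` on the output
    # should have sign `sign` (hypothetical monotonicity knowledge)
    x = x.clone().requires_grad_(True)
    output = model(x).sum()
    grad = torch.autograd.grad(output, x, create_graph=True)[0]
    # hinge penalty on gradients that contradict the injected knowledge
    return torch.relu(-sign * grad[:, feature_idx]).mean()

# multicriteria objective: fit the data, softly respect the knowledge
# loss = task_loss + lam * knowledge_penalty(model, x, feature_idx=3)
```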
With the proliferation of data in our society, large-scale data analysis is growing at an exponential pace. Such an abundance of data has the advantage of allowing decision-makers to implement complex models in scenarios that were previously unfeasible. At the same time, such data calls for a distributed way of thinking: deep learning models require a large amount of resources, and distributed training is needed. This paper presents a multicriteria approach to distributed learning. Our approach uses the weighted goal programming approach, in its Chebyshev formulation, to build an ensemble of decision rules that optimize performance metrics defined on test data. This formulation is beneficial because it is both model- and metric-agnostic and provides an interpretable output for the decision-maker. We test our approach on a practical application in electricity demand forecasting. Our results suggest that, when we allow the dataset splits to overlap, the performance of our methodology is consistently above that of the baseline model trained on the whole dataset.
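One way such a Chebyshev weighted goal programming step could look, assuming (as an illustrative simplification, not the paper's formulation) that each metric's deviation for the ensemble is linear in the combination weights; the problem then reduces to a small linear program:

```python
import numpy as np
from scipy.optimize import linprog

def chebyshev_goal_weights(D, w):
    """D[i, j]: deviation of decision rule j from the goal on metric i;
    w[i]: preference weight. Returns convex combination weights alpha
    minimizing the largest weighted deviation max_i w_i * (D @ alpha)_i."""
    m, n = D.shape
    c = np.r_[np.zeros(n), 1.0]                  # minimize lambda
    A_ub = np.c_[w[:, None] * D, -np.ones(m)]    # w_i (D alpha)_i <= lambda
    A_eq = np.r_[np.ones(n), 0.0][None, :]       # sum(alpha) = 1
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(m),
                  A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * (n + 1))
    return res.x[:n]
```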
Academic research and the financial industry have recently devoted great attention to machine learning algorithms owing to their power to solve complex learning tasks. However, in the field of firms' default prediction, the lack of interpretability has prevented the extensive adoption of black-box-type models. To overcome this drawback and maintain the high performance of black boxes, this paper relies on model-agnostic methods. Accumulated Local Effects and Shapley values are used to shape the predictors' impact on the likelihood of default and to rank them according to their contribution to the model outcome. Predictions are obtained by two machine learning algorithms (eXtreme Gradient Boosting and a feed-forward neural network) and compared with three standard discriminant models. The results show that our analysis of the Italian manufacturing SME sector benefits from the overall highest classification power of the eXtreme Gradient Boosting algorithm without giving up a rich interpretation framework.
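A hedged sketch of this pipeline using the shap and xgboost libraries, on synthetic stand-in data (the real analysis uses Italian SME balance-sheet predictors, which are not reproduced here):

```python
import numpy as np
import shap
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))                  # stand-in predictors
y = (X[:, 0] - X[:, 1] + rng.normal(size=1000) > 0).astype(int)

model = xgb.XGBClassifier(n_estimators=300, max_depth=4)
model.fit(X, y)

# Shapley values rank predictors by their contribution to the outcome
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
ranking = np.argsort(-np.abs(shap_values).mean(axis=0))

# first-order Accumulated Local Effects curves (the second ingredient)
# can be obtained by binning each predictor and averaging local
# prediction differences; packages such as PyALE implement this.
```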
Computational units in artificial neural networks follow a simplified model of biological neurons. In the biological model, the output signal of a neuron runs down the axon, splits following the many branches at its end, and passes identically to all the downward neurons of the network. Each of the downward neurons uses its copy of this signal as one of many dendritic inputs, integrates them all, and fires an output if the result is above some threshold. In the artificial neural network, this translates to the fact that the nonlinear filtering of the signal is performed in the upward neuron, meaning that in practice the same activation is shared among all the downward neurons that use that signal as their input. Dendrites thus play a passive role. We propose a slightly more complex model for the biological neuron, where dendrites play an active role: the activation in the output of the upward neuron becomes optional, and instead the signals going through each dendrite undergo independent nonlinear filterings, before the linear combination. We implement this new model as a ReLU computational unit and discuss its biological plausibility. We compare this new computational unit with the standard one and describe it from a geometrical point of view. We provide a Keras implementation of this unit for fully connected and convolutional layers and estimate the resulting changes in FLOPs and weight counts. We then use these layers in ResNet architectures on CIFAR-10, CIFAR-100, Imagenette, and Imagewoof, obtaining performance improvements of up to 1.73% over standard ResNets. Finally, we prove a universal representation theorem for continuous functions on compact sets and show that this new unit has more representational power than its standard counterpart.
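A minimal Keras sketch of such a unit for a fully connected layer, where every input/output connection applies its own ReLU before the linear combination; the parameter shapes and weight placement are assumptions, not the authors' released implementation:

```python
import tensorflow as tf

class DendriticDense(tf.keras.layers.Layer):
    """Dense layer with per-dendrite nonlinearities (a sketch):
        y_j = sum_i  v_ij * relu(w_ij * x_i + b_ij)
    i.e. each connection filters its signal independently before the sum."""
    def __init__(self, units):
        super().__init__()
        self.units = units

    def build(self, input_shape):
        d = int(input_shape[-1])
        self.w = self.add_weight(shape=(d, self.units), initializer="glorot_uniform")
        self.b = self.add_weight(shape=(d, self.units), initializer="zeros")
        self.v = self.add_weight(shape=(d, self.units), initializer="glorot_uniform")

    def call(self, x):
        # x: (batch, d) -> (batch, d, units) per-dendrite pre-activations
        z = x[:, :, None] * self.w[None, :, :] + self.b[None, :, :]
        # independent ReLU per dendrite, then the linear combination
        return tf.reduce_sum(self.v[None, :, :] * tf.nn.relu(z), axis=1)
```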
The open radio access network (O-RAN) embraces cloudification and network function virtualization for baseband function processing by disaggregated radio units (RUs), distributed units (DUs), and centralized units (CUs). These enable the cloud-RAN vision in full, where multiple mobile network operators (MNOs) can install their proprietary or open RUs but lease on-demand computational resources for DU-CU functions from commonly available open clouds via open x-haul interfaces. In this paper, we propose min-max fairness and Vickrey-Clarke-Groves (VCG) auction-based x-haul and DU-CU resource allocation mechanisms and compare their performance in creating a multi-tenant O-RAN ecosystem that is sustainable for small, medium, and large MNOs. The min-max fair approach minimizes the maximum OPEX of RUs through cost-sharing proportional to their demands, whereas the VCG auction-based approach minimizes the total OPEX for all resources utilized while extracting truthful demands from RUs. We consider time-wavelength division multiplexed (TWDM) passive optical network (PON)-based x-haul interfaces, where a PON virtualization technique is used to flexibly provide optical connections among RUs and edge clouds at macro-cell RU locations as well as open clouds at central office locations. Moreover, we design efficient heuristics that yield significantly better economic efficiency and network resource utilization than conventional greedy resource allocation algorithms and reinforcement learning-based algorithms.
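For reference, the VCG mechanism charges each bidder the externality it imposes on the others, which is what makes truthful demand reporting optimal. A generic textbook sketch (not the paper's O-RAN-specific formulation, and with hypothetical bidder names) is:

```python
def vcg(bidders, social_optimum):
    """Generic VCG payments: each bidder pays the externality it imposes
    on the others. `social_optimum(S)` must return (allocation,
    {bidder: realized value}) maximizing total value over bidder set S."""
    alloc, value = social_optimum(bidders)
    payments = {}
    for i in bidders:
        _, value_wo = social_optimum([j for j in bidders if j != i])
        payments[i] = sum(value_wo.values()) - sum(
            v for j, v in value.items() if j != i)
    return alloc, payments

# single-item toy case: VCG reduces to a second-price auction
bids = {"MNO-A": 5.0, "MNO-B": 8.0, "MNO-C": 3.0}   # hypothetical demands
def single_item(S):
    if not S:
        return None, {}
    winner = max(S, key=bids.get)
    return winner, {winner: bids[winner]}

alloc, pay = vcg(list(bids), single_item)   # MNO-B wins and pays 5.0
```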
When testing conditions differ from those represented in training data, so-called out-of-distribution (OOD) inputs can mar the reliability of black-box learned components in the modern robot autonomy stack. Therefore, coping with OOD data is an important challenge on the path towards trustworthy learning-enabled open-world autonomy. In this paper, we aim to demystify the topic of OOD data and its associated challenges in the context of data-driven robotic systems, drawing connections to emerging paradigms in the ML community that study the effect of OOD data on learned models in isolation. We argue that as roboticists, we should reason about the overall system-level competence of a robot as it performs tasks in OOD conditions. We highlight key research questions around this system-level view of OOD problems to guide future research toward safe and reliable learning-enabled autonomy.
Autoencoders are a popular model in many branches of machine learning and lossy data compression. However, their fundamental limits, the performance of gradient methods and the features learnt during optimization remain poorly understood, even in the two-layer setting. In fact, earlier work has considered either linear autoencoders or specific training regimes (leading to vanishing or diverging compression rates). Our paper addresses this gap by focusing on non-linear two-layer autoencoders trained in the challenging proportional regime in which the input dimension scales linearly with the size of the representation. Our results characterize the minimizers of the population risk, and show that such minimizers are achieved by gradient methods; their structure is also unveiled, thus leading to a concise description of the features obtained via training. For the special case of a sign activation function, our analysis establishes the fundamental limits for the lossy compression of Gaussian sources via (shallow) autoencoders. Finally, while the results are proved for Gaussian data, numerical simulations on standard datasets display the universality of the theoretical predictions.
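A toy version of the sign-activation setting can be set up in a few lines: encode Gaussian data with a sign-activation layer in the proportional regime (code dimension scaling with input dimension) and decode with least squares. This illustrates the rate-distortion trade-off being studied, using a random encoder rather than the gradient training analyzed in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 64, 32, 10_000       # proportional regime: k scales with d
X = rng.normal(size=(n, d))    # Gaussian source

A = rng.normal(size=(d, k)) / np.sqrt(d)   # random encoder weights
Z = np.sign(X @ A)                          # one-bit code per hidden unit

# least-squares decoder: B = argmin_B ||X - Z B||_F^2
B, *_ = np.linalg.lstsq(Z, X, rcond=None)
distortion = np.mean((X - Z @ B) ** 2)
print(f"rate k/d = {k/d:.2f}, distortion = {distortion:.3f}")
```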
Profile extrusion is a continuous production process for manufacturing plastic profiles from molten polymer. Especially interesting is the design of the die, through which the melt is pressed to attain the desired shape. However, due to an inhomogeneous velocity distribution at the die exit or residual stresses inside the extrudate, the final shape of the manufactured part often deviates from the desired one. To avoid these deviations, the shape of the die can be computationally optimized, which has already been investigated in the literature using classical optimization approaches. A new approach in the field of shape optimization is the utilization of Reinforcement Learning (RL) as a learning-based optimization algorithm. RL is based on trial-and-error interactions of an agent with an environment. For each action, the agent is rewarded and informed about the subsequent state of the environment. While not necessarily superior to classical optimization algorithms, e.g., gradient-based or evolutionary methods, for a single problem, RL techniques are expected to perform especially well when similar optimization tasks are repeated, since the agent learns a more general strategy for generating optimal shapes instead of concentrating on just one problem. In this work, we investigate this approach by applying it to two 2D test cases. The flow-channel geometry can be modified by the RL agent using so-called Free-Form Deformation, a method where the computational mesh is embedded into a transformation spline, which is then manipulated based on the control-point positions. In particular, we investigate the impact of different agents on the training progress and the potential for saving wall time by utilizing multiple environments during training.
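A skeletal gymnasium-style environment conveying this setup: actions displace free-form-deformation control points, and the reward favors a uniform exit-velocity profile. The `_simulate` stand-in replaces the actual flow solver, and all names, ranges, and the reward form are assumptions:

```python
import numpy as np
import gymnasium as gym

class FFDShapeEnv(gym.Env):
    """Toy stand-in for the die-design environment."""
    def __init__(self, n_ctrl=8):
        self.n_ctrl = n_ctrl
        self.action_space = gym.spaces.Box(-0.05, 0.05, shape=(n_ctrl,))
        self.observation_space = gym.spaces.Box(-1.0, 1.0, shape=(n_ctrl,))

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.ctrl = np.zeros(self.n_ctrl, dtype=np.float32)
        return self.ctrl.copy(), {}

    def step(self, action):
        # action = displacement of the FFD control points
        self.ctrl = np.clip(self.ctrl + action, -1.0, 1.0)
        velocity = self._simulate(self.ctrl)      # placeholder for CFD
        reward = -float(np.std(velocity))         # uniform-exit objective
        return self.ctrl.copy(), reward, False, False, {}

    def _simulate(self, ctrl):
        # stand-in for the flow solver: smooth response to control points
        return 1.0 + 0.1 * np.sin(np.cumsum(ctrl))
```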
The recent emergence of new algorithms for permuting models into functionally equivalent regions of the solution space has shed some light on the complexity of error surfaces and on some promising properties like mode connectivity. However, finding the right permutation is challenging, and current optimization techniques are not differentiable, which makes it difficult to integrate them into a gradient-based optimization pipeline and often leads to sub-optimal solutions. In this paper, we propose a Sinkhorn re-basin network with the ability to obtain the transportation plan that best suits a given objective. Unlike the current state of the art, our method is differentiable and, therefore, easy to adapt to any task within the deep learning domain. Furthermore, we show the advantage of our re-basin method by proposing a new cost function that enables incremental learning by exploiting the linear mode connectivity property. The benefit of our method is compared against similar approaches from the literature under several conditions, for both optimal transport finding and linear mode connectivity. The effectiveness of our continual learning method based on re-basin is also shown for several common benchmark datasets, providing experimental results that are competitive with state-of-the-art results from the literature.
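The differentiable ingredient here is the Sinkhorn operator, which relaxes a permutation into a doubly-stochastic matrix by iterated row/column normalization; a minimal sketch (the temperature and iteration count are illustrative choices, not the paper's settings):

```python
import torch

def sinkhorn(log_alpha, n_iters=20, tau=0.1):
    # alternate row/column normalization in log space; the result is
    # approximately doubly stochastic, i.e. a soft permutation, and the
    # whole operator is differentiable for end-to-end training
    log_alpha = log_alpha / tau
    for _ in range(n_iters):
        log_alpha = log_alpha - torch.logsumexp(log_alpha, dim=1, keepdim=True)
        log_alpha = log_alpha - torch.logsumexp(log_alpha, dim=0, keepdim=True)
    return log_alpha.exp()   # harden with the Hungarian algorithm if needed

# soft permutation aligning the units of one model to another
# (hypothetical 128-unit layer)
P = sinkhorn(torch.randn(128, 128, requires_grad=True))
```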